31 research outputs found

    ESMC: Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint

    Large-scale online recommender systems across the Internet are responsible for two basic estimation tasks: Click-Through Rate (CTR) and Post-Click Conversion Rate (CVR). However, traditional CVR estimators suffer from the well-known Sample Selection Bias and Data Sparsity issues. Entire-space models were proposed to address both issues by tracing the decision-making path "exposure → click → purchase". Further, some researchers observed that there are purchase-related behaviors between click and purchase, which better capture the user's decision-making intention and improve recommendation performance. The decision-making path has thus been extended to "exposure → click → in-shop action → purchase" and can be modeled with a conditional probability approach. Nevertheless, we observe that the chain rule of conditional probability does not always hold. We report the Probability Space Confusion (PSC) issue and mathematically derive the difference between the ground truth and the estimate. We propose a novel Entire Space Multi-Task Model for Post-Click Conversion Rate via Parameter Constraint (ESMC) and two alternatives, the Entire Space Multi-Task Model with Siamese Network (ESMS) and the Entire Space Multi-Task Model in Global Domain (ESMG), to address the PSC issue. Specifically, we handle "exposure → click → in-shop action" and "in-shop action → purchase" separately in light of the characteristics of in-shop actions: the first path is still treated with conditional probability, while the second is treated with a parameter-constraint strategy. Experiments in both offline and online environments of a large-scale recommendation system demonstrate the superiority of our proposed methods over state-of-the-art models. The real-world datasets will be released.
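    The chain-rule factorization that entire-space models rely on can be sketched numerically as follows. This is an illustration of the general ESMM-style decomposition only, with made-up probabilities; ESMC itself replaces the second step with a parameter-constraint strategy.

```python
# Illustrative chain-rule decomposition along the path
# exposure -> click -> in-shop action -> purchase.
# Probabilities below are invented for illustration.

def entire_space_probs(p_click, p_action_given_click, p_buy_given_action):
    """Compose per-step conditionals into entire-space probabilities."""
    p_ctavr = p_click * p_action_given_click    # P(click, action | exposure)
    p_ctcvr = p_ctavr * p_buy_given_action      # P(click, action, buy | exposure)
    return p_ctavr, p_ctcvr

p_ctavr, p_ctcvr = entire_space_probs(0.05, 0.2, 0.1)
assert abs(p_ctavr - 0.01) < 1e-12
assert abs(p_ctcvr - 0.001) < 1e-12
```

    The PSC issue reported above means this clean product does not always match the ground truth once in-shop actions are inserted into the chain, which is what motivates handling the second step differently.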

    MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

    Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to look up the embedding tables. Inference is also heavily latency-constrained, because a recommendation for a user must be produced within tens of milliseconds. In this paper, we propose MicroRec, a high-performance inference engine for recommendation systems. MicroRec accelerates recommendation inference by (1) redesigning the data structures involved in the embeddings to reduce the number of lookups needed and (2) taking advantage of the availability of High-Bandwidth Memory (HBM) in FPGA accelerators to tackle latency by enabling parallel lookups. We have implemented the resulting design on an FPGA board, including the embedding lookup step as well as the complete inference process. Compared to an optimized CPU baseline (16 vCPU, AVX2-enabled), MicroRec achieves 13.8~14.7x speedup on embedding lookup alone and 2.5~5.4x speedup for the entire recommendation inference in terms of throughput. As for latency, CPU-based engines need milliseconds to infer a recommendation while MicroRec takes only microseconds, a significant advantage in real-time recommendation systems. Comment: Accepted by MLSys'21 (the 4th Conference on Machine Learning and Systems).
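    The lookup-reduction idea (merging small embedding tables into one Cartesian-product table so a single memory access replaces several) can be sketched as follows. The table names and sizes here are illustrative assumptions, not MicroRec's actual data layout.

```python
import numpy as np

# Sketch: merge two small embedding tables A and B into one combined
# table indexed by the pair (i, j), so one lookup replaces two.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 8))   # table A: 4 rows, embedding dim 8
B = rng.standard_normal((3, 8))   # table B: 3 rows, embedding dim 8

# Row (i, j) of the combined table stores A[i] concatenated with B[j].
combined = np.concatenate(
    [np.repeat(A, len(B), axis=0), np.tile(B, (len(A), 1))], axis=1
)  # shape (4 * 3, 8 + 8) = (12, 16)

def lookup_combined(i, j):
    """Single memory access instead of one lookup per table."""
    return combined[i * len(B) + j]

i, j = 2, 1
assert np.array_equal(lookup_combined(i, j), np.concatenate([A[i], B[j]]))
```

    The trade-off, of course, is that the combined table has the product of the row counts, so this only pays off for small tables; that is the kind of choice the data-structure redesign above has to make.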

    Co-design Hardware and Algorithm for Vector Search

    Vector search has emerged as the foundation for large-scale information retrieval and machine learning systems, with search engines like Google and Bing processing tens of thousands of queries per second on petabyte-scale document datasets by evaluating vector similarities between encoded query texts and web documents. As performance demands on vector search systems surge, accelerated hardware offers a promising solution in the post-Moore's Law era. We introduce FANNS, an end-to-end and scalable vector search framework on FPGAs. Given a user-provided recall requirement on a dataset and a hardware resource budget, FANNS automatically co-designs the hardware and algorithm, subsequently generating the corresponding accelerator. The framework also supports scale-out by incorporating a hardware TCP/IP stack in the accelerator. FANNS attains up to 23.0× and 37.2× speedup compared to FPGA and CPU baselines, respectively, and demonstrates superior scalability to GPUs, achieving 5.5× and 7.6× speedup in median and 95th percentile (P95) latency in an eight-accelerator configuration. The remarkable performance of FANNS lays a robust groundwork for future FPGA integration in data centers and AI supercomputers. Comment: 11 pages.
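    The kernel such accelerators speed up is top-k retrieval by vector similarity. A brute-force CPU sketch of that kernel (an illustration of the workload, not the FANNS FPGA design, with made-up data sizes) looks like this:

```python
import numpy as np

# Brute-force top-k vector search by inner-product similarity.
def topk_inner_product(queries, docs, k):
    """Return, per query, indices of the k most similar documents."""
    scores = queries @ docs.T                    # (n_queries, n_docs)
    # argpartition isolates the k largest scores, then we sort just those k.
    idx = np.argpartition(-scores, k - 1, axis=1)[:, :k]
    order = np.argsort(np.take_along_axis(-scores, idx, axis=1), axis=1)
    return np.take_along_axis(idx, order, axis=1)

rng = np.random.default_rng(1)
docs = rng.standard_normal((1000, 64)).astype(np.float32)
queries = rng.standard_normal((5, 64)).astype(np.float32)
top = topk_inner_product(queries, docs, k=10)
assert top.shape == (5, 10)
```

    Approximate indexes (the algorithm side of the co-design) trade a controlled recall loss against avoiding this full scan, which is exactly the recall/resource trade-off the framework explores.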

    Fully connected neural network for simulation of displacements in self-stressed monolithic slabs on ground

    Zheltkovich A. E., Molosh V. V., Parkhots K. G., Saveiko N. G., Yuan Jinbin, Zhenhao Jiang, Zheng Haoyuan. Modeling of displacements in self-stressed monolithic slabs on ground using a fully connected neural network. This article illustrates a strategy of interdisciplinary convergence between mechanics and artificial intelligence. It presents the results of calculating displacements in self-stressed monolithic slabs on ground using a trained fully connected neural network. Empirical displacement measurements for slabs on ground are compared with displacements calculated from a physicomechanical model and with those obtained from the neural network. Our motivation for studying neural networks, which model biological neural networks, is as follows: they can autonomously detect patterns hidden in phenomena and can identify parameters from the complex behavioral trajectories of different physical systems. The authors describe in detail the developed and trained fully connected neural network.
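    A fully connected regression network of the kind the article trains maps a vector of slab parameters to a predicted displacement. The minimal forward-pass sketch below uses layer sizes and an activation that are illustrative assumptions, not the authors' architecture.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def mlp_forward(x, weights, biases):
    """Forward pass through dense layers, ReLU on hidden layers."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = relu(x @ W + b)
    return x @ weights[-1] + biases[-1]   # linear output: displacement

rng = np.random.default_rng(0)
sizes = [6, 32, 32, 1]   # 6 input features -> 1 predicted displacement
weights = [rng.standard_normal((a, b)) * 0.1 for a, b in zip(sizes, sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

x = rng.standard_normal((4, 6))   # batch of 4 slab-parameter vectors
y = mlp_forward(x, weights, biases)
assert y.shape == (4, 1)
```

    Training such a network amounts to fitting the weights so that predicted displacements match the empirical and physicomechanical results the article compares against.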

    Molecular Composition of Oxygenated Organic Molecules and Their Contributions to Organic Aerosol in Beijing

    Molecular-level understanding of ambient secondary organic aerosol (SOA) formation is hampered by poorly constrained formation mechanisms and insufficient analytical methods. Especially in developing countries, SOA-related haze is a great concern due to its significant effects on climate and human health. We present simultaneous measurements of gas-phase volatile organic compounds (VOCs), oxygenated organic molecules (OOMs), and particle-phase SOA in Beijing. We show that condensation of the measured OOMs explains 26-39% of the organic aerosol mass growth, with the contribution of OOMs to SOA enhanced during severe haze episodes. Our novel results provide a quantitative molecular connection from anthropogenic emissions to condensable organic oxidation product vapors, to their concentration in particle-phase SOA, and ultimately to haze formation. Peer reviewed.

    A New Oversampling Method Based on the Classification Contribution Degree

    Data imbalance is a thorny issue in machine learning. SMOTE is a well-known oversampling method for imbalanced learning. However, it has several disadvantages, such as sample overlapping, noise interference, and blindness in neighbor selection. To address these problems, we present a new oversampling method, OS-CCD, based on a new concept, the classification contribution degree. The classification contribution degree determines the number of synthetic samples generated by SMOTE for each positive sample. OS-CCD follows the spatial distribution characteristics of the original samples on the class boundary while avoiding oversampling from noisy points. Experiments on twelve benchmark datasets demonstrate that OS-CCD outperforms six classical oversampling methods in terms of accuracy, F1-score, AUC, and ROC.
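    The core mechanism, SMOTE-style interpolation with a per-sample budget, can be sketched as follows. The uniform-random weighting stands in for the paper's classification contribution degree, whose actual definition is not given here; data and weights are invented for illustration.

```python
import numpy as np

def smote_like(X_pos, n_synthetic, weights, k=3, seed=0):
    """Generate synthetic positives by interpolating toward neighbors,
    allocating the budget in proportion to each sample's weight
    (a stand-in for the classification contribution degree)."""
    rng = np.random.default_rng(seed)
    counts = np.round(n_synthetic * weights / weights.sum()).astype(int)
    out = []
    for i, c in enumerate(counts):
        # k nearest positive neighbors of sample i (excluding itself)
        d = np.linalg.norm(X_pos - X_pos[i], axis=1)
        nn = np.argsort(d)[1:k + 1]
        for _ in range(c):
            j = rng.choice(nn)
            lam = rng.random()
            out.append(X_pos[i] + lam * (X_pos[j] - X_pos[i]))
    return np.array(out)

X_pos = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
w = np.array([1.0, 1.0, 2.0, 4.0])   # boundary samples weighted higher
synth = smote_like(X_pos, n_synthetic=8, weights=w)
assert synth.shape == (8, 2)
```

    Plain SMOTE corresponds to uniform weights; concentrating the budget near the class boundary and away from noisy points is the behavior the contribution degree is designed to produce.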

    The prognostic significance of the neutrophil-to-lymphocyte ratio and the platelet-to-lymphocyte ratio in giant cell tumor of the extremities

    Abstract. Background: In this study, the influence of the neutrophil-to-lymphocyte ratio (NLR) and the platelet-to-lymphocyte ratio (PLR) on the prognosis of giant cell tumor (GCT) of the extremities was investigated. Methods: The clinical parameters of 163 patients diagnosed with GCT of the extremities between July 2008 and January 2018 were retrospectively analyzed. Optimal cutoff values of NLR and PLR were determined using receiver operating characteristic (ROC) analysis. According to these cutoff values, patients were divided into high and low NLR groups and high and low PLR groups. Kaplan-Meier and log-rank methods were used to compare recurrence-free survival (RFS) between the high and low NLR groups, and between the high and low PLR groups. Univariate analysis was performed to determine the influence of age, gender, neutrophil count, lymphocyte count, platelet count, white blood cell count, tumor size, surgical approach, and Campanacci stage on prognosis. The main predictors of RFS were determined by Cox multivariate regression analysis. Results: The optimal cutoff value of NLR was 2.32, which was used to classify patients into high and low NLR groups; the optimal cutoff value of PLR was 116.81, used to classify patients into high and low PLR groups. Campanacci stage, maximum tumor diameter, alkaline phosphatase, and C-reactive protein (CRP) were significantly associated with high NLR and PLR. Cox multivariate regression analysis revealed that Campanacci stage (HR = 3.28, 95% CI: 1.24~8.69) and NLR (HR = 4.18, 95% CI: 1.83~9.57) were independent prognostic factors. Conclusion: As a novel inflammatory index, NLR has some predictive power for the prognosis of patients with giant cell tumor of the extremities.
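    An optimal ROC cutoff of this kind is commonly chosen by maximizing Youden's J (sensitivity + specificity - 1). The sketch below shows that standard procedure on a tiny synthetic cohort; the numbers are invented and the study does not state which criterion it used.

```python
import numpy as np

def youden_cutoff(values, recurred):
    """Return the cutoff on `values` (e.g. NLR) maximizing Youden's J."""
    values = np.asarray(values, float)
    recurred = np.asarray(recurred, bool)
    best_cut, best_j = None, -1.0
    for cut in np.unique(values):
        pred = values >= cut                        # high ratio = high risk
        sens = (pred & recurred).sum() / recurred.sum()
        spec = (~pred & ~recurred).sum() / (~recurred).sum()
        j = sens + spec - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j

nlr = [1.1, 1.8, 2.0, 2.4, 2.9, 3.5, 4.0, 1.5]   # synthetic NLR values
rec = [0,   0,   0,   1,   1,   1,   1,   0]     # 1 = recurrence
cut, j = youden_cutoff(nlr, rec)
assert cut == 2.4 and j == 1.0    # perfectly separable toy data
```

    On real, overlapping data J is well below 1, and the resulting threshold is then used, as above, to dichotomize patients into high and low groups for Kaplan-Meier comparison.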

    Distributed Recommendation Inference on FPGA Clusters

    Deep neural networks are widely used in personalized recommendation systems. Such models involve two major components: the memory-bound embedding layer and the computation-bound fully connected layers. Existing solutions are either slow on both stages or optimize only one of them. To implement recommendation inference efficiently in the context of a real deployment, we design and implement an FPGA cluster that optimizes the performance of both stages. To remove the memory bottleneck, we take advantage of the High-Bandwidth Memory (HBM) available on the latest FPGAs for highly concurrent embedding table lookups. To match the required DNN computation throughput, we partition the workload across multiple FPGAs interconnected via a 100 Gbps TCP/IP network. Compared to an optimized CPU baseline (16 vCPU, AVX2-enabled) and a one-node FPGA implementation, our system (four-node version) achieves 28.95x and 7.68x speedup in throughput, respectively. The proposed system also guarantees a latency of tens of microseconds per single inference, significantly better than CPU- and GPU-based systems, which take at least milliseconds.
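    Partitioning the fully connected layers across nodes is, at its simplest, a pipeline-balancing problem: split consecutive layers into stages with roughly equal compute. The greedy sketch below illustrates that idea; the cost model and numbers are our assumptions, not the paper's partitioner.

```python
def balance_stages(layer_costs, n_stages):
    """Greedily split consecutive layers into n_stages pipeline stages,
    starting a new stage when the current one exceeds the ideal share."""
    target = sum(layer_costs) / n_stages
    stages, cur = [[]], 0.0
    for c in layer_costs:
        if cur + c > target and len(stages) < n_stages and stages[-1]:
            stages.append([])
            cur = 0.0
        stages[-1].append(c)
        cur += c
    return stages

# Made-up relative MAC counts per dense layer
costs = [8, 6, 4, 4, 2, 2, 2, 2]
stages = balance_stages(costs, 4)
assert len(stages) == 4
assert sum(sum(s) for s in stages) == sum(costs)   # no layer lost
```

    Pipeline throughput is set by the slowest stage, so a real partitioner would also weigh inter-node transfer cost over the 100 Gbps links, which this sketch ignores.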

    Transient analysis of the LP rotor of a 900 MW NPP turbine

    Abstract. Thermal stress, and the contact stress under the centrifugal force field, during start-up and shut-down are most important for the safety of the turbine and affect its design life. The stress at start-up and shut-down is much larger than the stress under other conditions, so the stress level and the fatigue life are important for the safety and economy of the rotor. In this paper, the temperature dependence of the material's mechanical properties is taken into account. The vapor pressure and temperature at different positions on the rotor and at different points in the load history are used to calculate the film coefficient. A two-dimensional thermal-mechanical coupled model is used to calculate the transient temperature field and stress field. A three-dimensional contact model is used to calculate the stress field and contact stress under centrifugal loading conditions.
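    A first-order indication of why start-up transients dominate rotor stress is the classic constrained thermal-stress estimate sigma = E * alpha * dT / (1 - nu). The values below are generic steel assumptions for illustration, not the paper's rotor data or its coupled finite-element result.

```python
# Order-of-magnitude thermal stress in a constrained component.
E = 200e9        # Young's modulus, Pa (generic steel, assumed)
alpha = 12e-6    # thermal expansion coefficient, 1/K (assumed)
nu = 0.3         # Poisson's ratio (assumed)
dT = 100.0       # surface-to-bore temperature difference, K (assumed)

sigma = E * alpha * dT / (1 - nu)   # Pa
assert round(sigma / 1e6) == 343    # roughly 343 MPa
```

    Even a modest 100 K through-thickness gradient yields stresses of hundreds of MPa, which is why the transient temperature field during start-up and shut-down, not the steady state, sizes the fatigue life.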